2022-04-04

Syllabus

  • Static word embeddings
    • Frequency based methods, word2vec, GloVe, fastText, evaluation of embeddings
  • Contextual word embeddings
    • ELMo, Transformers and attention, BERT, sentence embeddings, contrastive learning
  • Additional topics
    • Geometry of the embedding space, bias, sentiment, multilingual embeddings
  • Topological data analysis
    • Hyperbolic embeddings, singularities and topological polysemy

Motivation: Winograd schemas

  • The trophy doesn’t fit into the brown suitcase because it’s too large.
  • The trophy doesn’t fit into the brown suitcase because it’s too small.

Task: Co-reference resolution

Motivation: Winograd schemas

  • The city councilmen refused the demonstrators a permit because they feared violence.
  • The city councilmen refused the demonstrators a permit because they advocated violence.

Task: Co-reference resolution

  • easy for humans to solve
  • difficult for computers
    • solution relies on real-world knowledge and common sense reasoning

Motivation: Winograd schemas

  • I put the cake away in the refrigerator. It has a lot of butter in it.
  • I put the cake away in the refrigerator. It has a lot of leftovers in it.

Motivation: Garden-path sentences

  • The old man the boat.

  • The complex houses married and single soldiers and their families.
  • The horse raced past the barn fell.

Methods

Some of the word vectors from a 100 dimensional fastText embedding trained on a Wikipedia corpus; projected to 2 dimensions using t-SNE.

Applications of word embeddings

  • Word-sense induction (WSI) or word-sense discrimination: task is the identification of the senses/meanings of a word
  • Output: clustering of contexts of the target word, or a clustering of words related to the target word

Example:

  • target word “cold”
  • collection of sentences:
    • “I caught a cold.”
    • “The weather is cold.”
    • “The ice cream is cold.”

Output: ?
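One way to produce such a clustering, as a toy sketch: represent each context of the target word by its surrounding words and greedily merge contexts whose word overlap (Jaccard similarity) exceeds a threshold. Real WSI systems cluster embedding vectors instead; the whitespace tokenizer and the threshold value here are illustrative assumptions.

```python
def tokenize(s):
    return [w.strip(".,").lower() for w in s.split()]

def context(sentence, target="cold"):
    # Context = all words in the sentence except the target itself
    return {w for w in tokenize(sentence) if w != target}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_contexts(sentences, target="cold", threshold=0.2):
    """Greedy single-link clustering of target-word contexts by word overlap."""
    clusters = []  # list of (member_indices, merged_context_words)
    for i, s in enumerate(sentences):
        ctx = context(s, target)
        for members, merged in clusters:
            if jaccard(ctx, merged) >= threshold:
                members.add(i)
                merged |= ctx
                break
        else:
            clusters.append(({i}, set(ctx)))
    return [sorted(m) for m, _ in clusters]

sentences = ["I caught a cold.", "The weather is cold.", "The ice cream is cold."]
print(cluster_contexts(sentences))  # → [[0], [1, 2]]
```

The illness sense ends up alone, while the two temperature contexts merge because they share function words; with embeddings instead of raw overlap, content words would drive the grouping.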

  • Word-sense disambiguation (WSD): relies on a predefined sense inventory; the task is to resolve the ambiguity of a word in its context
  • Output: identifying which sense of a word is used in a sentence

Part-of-speech tagging

  • grammatical tagging: decide which part of speech (noun, verb, article, adjective, preposition, pronoun, adverb, conjunction, and interjection) a word in a text corpus belongs to

The PoS of a word may depend on both its definition and its context

  • in natural language, a large portion of word forms is ambiguous
  • example from Wikipedia:
    • “dogs” usually is a plural noun,
    • but can also be a verb as in the sentence “The sailor dogs the hatch.”
  • example where order matters:
    • “can of fish”
    • “we can fish”

Sub-categories for PoS tagging:

  • for nouns, the plural, possessive, and singular forms can be distinguished.
  • “case” (role as subject, object, etc.), grammatical gender, and so on
  • verbs are marked for tense, aspect, and other things
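A classic baseline that illustrates why ambiguity matters: assign each word its most frequent tag from a tagged corpus, ignoring context entirely. The tiny corpus and the coarse tagset below are made up for illustration; such a baseline necessarily mis-tags "dogs" in "The sailor dogs the hatch."

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Most-frequent-tag baseline: count (word, tag) pairs in a tagged corpus."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag(model, sentence, default="NOUN"):
    # Unknown words fall back to a default tag (nouns are a common choice)
    return [(w, model.get(w.lower(), default)) for w in sentence]

# Hypothetical toy training corpus with a coarse tagset
corpus = [
    [("The", "DET"), ("dogs", "NOUN"), ("bark", "VERB")],
    [("The", "DET"), ("sailor", "NOUN"), ("dogs", "VERB"), ("the", "DET"), ("hatch", "NOUN")],
    [("Dogs", "NOUN"), ("sleep", "VERB")],
]
model = train_baseline(corpus)
print(tag(model, ["The", "dogs", "sleep"]))
```

Here "dogs" is always tagged NOUN (2 noun vs. 1 verb occurrence), which is exactly the context-blindness that sequence models and contextual embeddings address.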

Other tagging tasks:

Text classification

  • Document classification: spam / not spam

  • Review classification: positive / negative

  • Sentiment: positive / neutral / negative

  • single-label classification / multi-label classification

Generative and Discriminative Models

  • Generative models:
    • learn the underlying data distribution \(P(x, y) = P(x | y) \cdot P(y)\)
    • prediction: given an input \(x\), pick a class with the highest joint probability \(y = \mathop{\mathrm{argmax}}_{k} P(x | y = k) \cdot P(y = k)\)
      • maximum a posteriori (MAP) estimate
  • Discriminative models:
    • learn the boundaries between classes (i.e. learn how to use the features)
    • prediction: given an input \(x\), pick the class with the highest conditional probability \(y = \mathop{\mathrm{argmax}}_{k} P(y = k | x)\)
      • Maximum Likelihood Estimate (MLE) of parameters

TODO: How to do prediction

Bag of Words (BoW) assumption: word order does not matter
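Under the BoW assumption, a generative classifier such as multinomial Naive Bayes follows directly from the MAP rule above: estimate \(P(y)\) and \(P(w \mid y)\) from counts, then pick \(\mathop{\mathrm{argmax}}_k P(y{=}k) \prod_w P(w \mid y{=}k)\). A minimal sketch with a made-up spam/ham corpus and add-one smoothing:

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Estimate class priors P(y) and word counts for P(word | y)."""
    classes = set(labels)
    priors = {c: labels.count(c) / len(labels) for c in classes}
    word_counts = {c: Counter() for c in classes}
    vocab = set()
    for doc, y in zip(docs, labels):
        words = doc.lower().split()
        word_counts[y].update(words)
        vocab.update(words)
    return priors, word_counts, vocab

def predict_nb(doc, priors, word_counts, vocab):
    """MAP prediction: argmax_k log P(y=k) + sum_w log P(w | y=k)."""
    scores = {}
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        score = math.log(prior)
        for w in doc.lower().split():
            if w in vocab:  # ignore out-of-vocabulary words
                # add-one (Laplace) smoothing avoids zero probabilities
                score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)

docs = ["win money now", "cheap money offer", "meeting at noon", "lunch at noon today"]
labels = ["spam", "spam", "ham", "ham"]
params = train_nb(docs, labels)
print(predict_nb("cheap money", *params))  # → spam
```

Log probabilities are used instead of raw products to avoid numerical underflow on long documents.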

TODO

Static word embeddings

TODO

Frequency based methods

TODO

word2vec, GloVe, fastText

word2vec: (Mikolov, Chen, et al. 2013; Mikolov, Sutskever, et al. 2013)
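word2vec's skip-gram variant trains on (center word, context word) pairs drawn from a sliding window over the corpus. The pair extraction can be sketched as follows; the actual vector training (e.g. via negative sampling) is omitted:

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs as used by skip-gram word2vec."""
    pairs = []
    for i, center in enumerate(tokens):
        # clamp the window at sentence boundaries
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs("the quick brown fox".split(), window=1))
```

Each pair becomes one training example: predict the context word from the center word (skip-gram) or vice versa (CBOW).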

Contextual word embeddings

  • I’m going to the bank to withdraw some money.
  • We’re sitting on the river bank with some friends.
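A toy illustration of what "contextual" means (this is not how ELMo or BERT compute their representations): blend a word's static vector with the average of its sentence neighbors, so the same word gets a different vector in each sentence. All vectors below are invented 2-d placeholders.

```python
# Hypothetical 2-d static vectors (illustrative values only)
static = {
    "bank": (0.5, 0.5), "money": (1.0, 0.0), "withdraw": (0.9, 0.1),
    "river": (0.0, 1.0), "sitting": (0.1, 0.9),
}

def contextual(word, sentence, alpha=0.5):
    """Toy contextualization: mix the static vector with the sentence average."""
    neighbors = [static[w] for w in sentence if w != word and w in static]
    avg = tuple(sum(v[k] for v in neighbors) / len(neighbors) for k in range(2))
    base = static[word]
    return tuple(alpha * base[k] + (1 - alpha) * avg[k] for k in range(2))

v1 = contextual("bank", ["withdraw", "money", "bank"])
v2 = contextual("bank", ["sitting", "river", "bank"])
print(v1, v2)  # same word, two different vectors
```

A static embedding assigns "bank" one vector for both senses; any contextual model instead produces a representation that depends on the surrounding words, which is the property this toy blend mimics.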

Recurrent methods: ELMo

Transformers

(Vaswani et al. 2017)
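The core operation of the Transformer is scaled dot-product attention, \(\mathrm{softmax}(QK^T/\sqrt{d_k})\,V\) (Vaswani et al. 2017). A dependency-free sketch over plain lists (single head, no masking or learned projections):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # one attention distribution per query
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(attention(Q, K, V))
```

The query matching the first key pulls the output toward the first value row; because the attention weights sum to one, the output is a convex combination of the value vectors.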

Bidirectional Encoder Representations from Transformers (BERT)

Huggingface transformers

Sentence embeddings

  • Sentence-BERT
    • sentence-pair regression tasks like semantic textual similarity (STS)
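For STS-style tasks, sentence pairs are typically scored by the cosine similarity of their embeddings. The vectors below are hypothetical placeholders standing in for what a model like Sentence-BERT would produce:

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product of the vectors divided by their norms."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical sentence embeddings (illustrative values only)
e1 = [0.8, 0.1, 0.3]   # "A man is playing guitar."
e2 = [0.7, 0.2, 0.4]   # "Someone plays a guitar."
e3 = [-0.1, 0.9, -0.2] # "The stock market fell today."
print(cosine(e1, e2), cosine(e1, e3))  # similar pair scores higher
```

Cosine similarity depends only on the angle between vectors, so embeddings of different norms remain comparable.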

Geometry of the embedding space

TODO

Bias

(Bolukbasi et al. 2016)
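Bolukbasi et al. quantify gender bias by projecting word vectors onto a gender direction such as he − she: occupation words with projections of opposite sign sit on opposite sides of that axis. A sketch with invented 3-d vectors standing in for real embeddings:

```python
import math

def project_on(v, direction):
    """Scalar projection of v onto a direction (here, a gender axis)."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    return sum(a * b for a, b in zip(v, unit))

# Hypothetical 3-d vectors for illustration only
he = [1.0, 0.2, 0.0]
she = [-1.0, 0.2, 0.0]
gender_direction = [h - s for h, s in zip(he, she)]

programmer = [0.4, 0.5, 0.3]   # a biased embedding leans toward "he"
homemaker = [-0.5, 0.5, 0.2]   # ... and this one toward "she"
print(project_on(programmer, gender_direction),
      project_on(homemaker, gender_direction))
```

Debiasing in that paper amounts to removing this component from gender-neutral words, i.e. subtracting the projection along the gender direction.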

Sentiment

(Yu et al. 2017)

Multilingual embeddings

Hyperbolic embeddings

(Nickel and Kiela 2017)

  • Poincaré GloVe (Tifrea, Bécigneul, and Ganea 2018)
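Distances in these models live in the Poincaré ball, where \(d(u,v) = \operatorname{arcosh}\!\left(1 + 2\,\frac{\lVert u-v \rVert^2}{(1-\lVert u \rVert^2)(1-\lVert v \rVert^2)}\right)\). A sketch of the distance function, illustrating why hierarchies fit: points near the boundary are far apart even when Euclidean-close.

```python
import math

def poincare_distance(u, v):
    """Hyperbolic distance in the Poincaré ball model (||u||, ||v|| < 1)."""
    diff2 = sum((a - b) ** 2 for a, b in zip(u, v))
    nu2 = sum(a * a for a in u)
    nv2 = sum(b * b for b in v)
    return math.acosh(1 + 2 * diff2 / ((1 - nu2) * (1 - nv2)))

# Same Euclidean scale, very different hyperbolic distances
print(poincare_distance((0.0, 0.0), (0.1, 0.0)))   # near the origin: small
print(poincare_distance((0.9, 0.0), (0.99, 0.0)))  # near the boundary: large
```

This boundary blow-up gives exponentially growing "room" toward the edge of the ball, which is what lets trees and other hierarchies embed with low distortion.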

Hyperbolic image embeddings (Khrulkov et al. 2019)

Singularities and Topological Data Analysis (TDA)

  • manifold hypothesis does not hold at all points of certain static word embeddings

(Jakubowski, Gasic, and Zibrowius 2020)

  • topological polysemy: count the number of “meanings” around a singularity

Thank you!

Organisation

  • Schedule: See Google sheet
  • Each week talks by students (1 or 2 speakers per session, 70 minutes in total)
    • there should be enough time for questions and a discussion
  • Guest lecture?
  • The final grade is based on your presentation
  • Hand in your extended abstract (ideally .tex, .bib files and compiled .pdf; maximum 2 pages with references) via ILIAS

References

Bolukbasi, Tolga, Kai-Wei Chang, James Y. Zou, Venkatesh Saligrama, and Adam Kalai. 2016. “Man Is to Computer Programmer as Woman Is to Homemaker? Debiasing Word Embeddings.” CoRR abs/1607.06520. http://arxiv.org/abs/1607.06520.

Conneau, Alexis, Guillaume Lample, Marc’Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou. 2017. “Word Translation Without Parallel Data.” CoRR abs/1710.04087. http://arxiv.org/abs/1710.04087.

Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. “BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding.” CoRR abs/1810.04805. http://arxiv.org/abs/1810.04805.

Jakubowski, Alexander, Milica Gasic, and Marcus Zibrowius. 2020. “Topology of Word Embeddings: Singularities Reflect Polysemy.” In Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics, 103–13. https://arxiv.org/abs/2011.09413.

Jurafsky, Daniel, and James H. Martin. 2009. Speech and Language Processing. 2nd ed. Pearson Prentice Hall.

Khrulkov, Valentin, Leyla Mirvakhabova, Evgeniya Ustinova, Ivan V. Oseledets, and Victor S. Lempitsky. 2019. “Hyperbolic Image Embeddings.” CoRR abs/1904.02239. http://arxiv.org/abs/1904.02239.

Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning. 2015. “Bilingual Word Representations with Monolingual Quality in Mind.” In NAACL Workshop on Vector Space Modeling for NLP. Denver, United States.

Mikolov, Tomás, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. “Efficient Estimation of Word Representations in Vector Space.” In 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track Proceedings, edited by Yoshua Bengio and Yann LeCun. http://arxiv.org/abs/1301.3781.

Mikolov, Tomás, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. “Distributed Representations of Words and Phrases and Their Compositionality.” CoRR abs/1310.4546. http://arxiv.org/abs/1310.4546.

Nickel, Maximilian, and Douwe Kiela. 2017. “Poincaré Embeddings for Learning Hierarchical Representations.” CoRR abs/1705.08039. http://arxiv.org/abs/1705.08039.

Tifrea, Alexandru, Gary Bécigneul, and Octavian-Eugen Ganea. 2018. “Poincaré GloVe: Hyperbolic Word Embeddings.” CoRR abs/1810.06546. http://arxiv.org/abs/1810.06546.

Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need.” CoRR abs/1706.03762. http://arxiv.org/abs/1706.03762.

Yu, Liang-Chih, Jin Wang, K. Robert Lai, and Xuejie Zhang. 2017. “Refining Word Embeddings for Sentiment Analysis.” In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 534–39. Copenhagen, Denmark: Association for Computational Linguistics. https://doi.org/10.18653/v1/D17-1056.